# efficient inference
## Phi 4 Mini Instruct Float8dq

The Phi-4-mini-instruct model quantized with float8 dynamic activation and weight quantization via torchao, achieving a 36% VRAM reduction and a 15-20% speed improvement on H100 with minimal accuracy impact.

- Category: Large Language Model (Transformers)
- Author: pytorch
- License: MIT
- Downloads: 1,006 · Likes: 1
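As a rough illustration of the recipe named above, the sketch below quantizes a bf16 checkpoint with torchao's float8 dynamic activation and weight quantization. The model id and the exact torchao entry points are assumptions (API names vary across torchao releases), not the card's published recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torchao.quantization import quantize_, float8_dynamic_activation_float8_weight

# Assumed base model id; the card above likely ships already-quantized weights.
model_id = "microsoft/Phi-4-mini-instruct"

# Load in bf16 first; quantization then rewrites the linear layers in place.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Dynamic float8 quantization: weights are stored in float8, and activations
# are quantized to float8 on the fly at each matmul.
quantize_(model, float8_dynamic_activation_float8_weight())
```

Because activation quantization is dynamic, no calibration pass is needed, which is what makes this recipe a near drop-in swap for the bf16 model.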
## Mistral Small 3.1 24B Instruct 2503 GPTQ 4b 128g

An INT4-quantized version of Mistral-Small-3.1-24B-Instruct-2503. The GPTQ algorithm reduces the weights from 16-bit to 4-bit (group size 128), significantly decreasing disk size and GPU memory requirements.

- Category: Large Language Model
- Author: ISTA-DASLab
- License: Apache-2.0
- Downloads: 21.89k · Likes: 13
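A hedged sketch of loading such a GPTQ checkpoint with transformers follows. The repo id is inferred from the entry name, and the Auto class is an assumption (Mistral Small 3.1 is multimodal, so the right class may differ by transformers version); a GPTQ kernel backend such as gptqmodel must be installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, inferred from the entry name above.
model_id = "ISTA-DASLab/Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g"

# The quantization config ships inside the checkpoint, so a plain
# from_pretrained picks up the GPTQ 4-bit weights automatically.
# Rough memory math: 24B params at 16 bits is ~48 GB of weights;
# at 4 bits it is ~12 GB, plus per-group scales (one per 128 weights).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```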
## Omost Phi 3 Mini 128k 8bits

Omost's phi-3-mini model with 128k context length, utilizing fp8 precision.

- Category: Large Language Model (Transformers)
- Author: lllyasviel
- Downloads: 47 · Likes: 7
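To make "fp8 precision" concrete, here is a minimal, generic PyTorch sketch of casting a weight tensor to float8 (e4m3). It illustrates only the storage-versus-precision trade-off and is not the recipe used to produce this checkpoint.

```python
import torch

# A bf16 tensor standing in for a transformer linear layer's weights.
w_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)

# Casting to float8 (e4m3) halves storage relative to bf16 (1 byte/element)...
w_fp8 = w_bf16.to(torch.float8_e4m3fn)

# ...at the cost of precision: e4m3 keeps only 3 mantissa bits.
roundtrip = w_fp8.to(torch.bfloat16)
print("max round-trip error:", (w_bf16 - roundtrip).abs().max().item())
```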
## Omost Llama 3 8b 4bits

Omost's released llama-3 model, featuring 8k context length and nf4 quantization.

- Category: Large Language Model (Transformers)
- Author: lllyasviel
- Downloads: 1,163 · Likes: 21
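The sketch below shows how an nf4 checkpoint like this one is typically loaded through transformers' bitsandbytes integration. The repo id is inferred from the entry name; if the published weights already embed their quantization config, the explicit config shown here is illustrative rather than required.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# nf4 ("NormalFloat4") is bitsandbytes' 4-bit data type, designed for
# normally distributed weights; compute still runs in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Assumed repo id, inferred from the entry name above.
model_id = "lllyasviel/omost-llama-3-8b-4bits"
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```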